10 research outputs found

    Towards robust real-world historical handwriting recognition

    In this thesis, we build a bridge from the past to the future by applying artificial-intelligence methods to text recognition in a historical Dutch collection of the Natuurkundige Commissie, which explored Indonesia (1820-1850). Despite the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital text, historical manuscripts are available only as extremely diverse collections of (pixel) images. Despite their strong results, current deep-learning methods are very data greedy and time consuming, depend heavily on human experts from the humanities for labeling, and require machine-learning experts to design the models. Ideally, the use of deep-learning methods should require minimal human effort, let an algorithm observe the evolution of the training process, and avoid inefficient use of the already sparse amount of labeled data. We present several approaches to these problems, aiming to improve the robustness of current methods and the autonomy of training. We applied our novel word- and line-level text-recognition approaches to nine data sets differing in time period, language, and difficulty: three locally collected historical Latin-script data sets from Naturalis, Leiden; four public Latin-script benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, we achieved a level of accuracy that required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluating each training epoch without the need for labeled data.
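The ensemble voting mentioned above can be illustrated with a minimal sketch: each of the five networks proposes a word hypothesis for the same word image, and the plurality label wins. The input format and the example strings are illustrative assumptions, not the thesis's actual interface.

```python
from collections import Counter

def ensemble_vote(predictions):
    """Plurality voting over word hypotheses from several recognizers.

    `predictions` is a list of word strings, one per ensemble member
    (hypothetical input format; the thesis uses five CNN/LSTM networks).
    """
    counts = Counter(predictions)
    word, _ = counts.most_common(1)[0]
    return word

# Five hypothetical network outputs for one word image; a single
# misreading ("Commissic") is outvoted by the other four members.
print(ensemble_vote(["Commissie", "Commissie", "Commissie",
                     "Commissic", "Commissie"]))
```

The design intuition is that independent networks rarely make the same mistake on the same image, so a small ensemble already corrects many single-network errors.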

    A high-performance word recognition system for the biological fieldnotes of the Natuurkundige Commissie

    In this research, a high word-recognition accuracy was achieved using an e-Science-friendly deep-learning method on a highly multilingual data set. Deep learning requires large training sets. Therefore, we use an auxiliary data set in addition to the target data set, which is derived from the collection Natuurkundige Commissie, years 1820-1850. The auxiliary historical data set is from another writer (van Oort). The method concerns a compact ensemble of convolutional bidirectional long short-term memory (LSTM) neural networks. A dual-state word-beam search combined with an adequate label-coding scheme is used for decoding the connectionist temporal classification (CTC) layer. Our approach increased the recognition accuracy of words that the recognizer has never seen, i.e., out-of-vocabulary (OOV) words, by 3.5 percentage points. The use of extraneous training data increased the performance on in-vocabulary words by 1 pp. The network architectures in the ensemble are generated randomly and autonomously, such that our system can be deployed on an e-Science server. The OOV capability allows scholars to search for words that did not exist in the original training set.
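Decoding a CTC output layer can be sketched in its simplest form, greedy best-path decoding: take the most probable symbol per time step, collapse repeats, and drop the blank symbol. This is a simplified stand-in for the dual-state word-beam search the paper actually uses; the alphabet and score values are made up for illustration.

```python
def best_path_decode(logit_rows, alphabet, blank=0):
    """Greedy (best-path) CTC decoding: argmax per time step,
    collapse consecutive repeats, drop the blank symbol.

    `logit_rows` is a list of per-frame score lists; index `blank`
    is the CTC blank, index i+1 maps to `alphabet[i]`.
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in logit_rows]
    out, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:
            out.append(alphabet[idx - 1])
        prev = idx
    return "".join(out)

# Toy example over the alphabet "ab" (index 0 is the CTC blank):
frames = [
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.8, 0.1],    # 'a' repeated -> collapsed
    [0.9, 0.05, 0.05],  # blank, separating symbols
    [0.1, 0.1, 0.8],    # 'b'
]
print(best_path_decode(frames, "ab"))  # "ab"
```

A word-beam search replaces the per-frame argmax with a beam over dictionary-consistent prefixes, which is how the paper's decoder can constrain hypotheses to in-vocabulary words while still allowing an OOV state.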

    Correction to: A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification (Neural Computing and Applications, (2021), 33, 14, (8615-8634), 10.1007/s00521-020-05612-0)

    In the original publication of this article, in Table 11 and Fig. 9, there was an error in the calculation of the weighted average of the word-accuracy values. The corrected figure and table results are provided in this erratum and turned out to be slightly higher. These weighted-average rates are dominated by the large KdK data set and are not the focus of the interpretation of the results: the differences within the individual data sets are more important for understanding the effects of the conditions, i.e., dictionary size and ensemble application. Therefore, the miscalculation has no effect on the Discussion section. The corrected figure compares the effect of the two label-coding schemes (Plain vs Extra-separator) and of dictionary application, for the single architecture and for ensemble voting, on the RIMES, KdK, and GW data sets, showing the weighted average word accuracy taking test-set sizes into account. The corrected table gives the weighted average word accuracy (%) on the RIMES, KdK, and GW data sets, using the dual-state word-beam search with the Concise dictionary and the Extra-separator label-coding scheme, for the two CTC methods and single vs ensemble voting (averaging carried out over sets):

    CTC decoder                    Single   Ensemble
    Best path                       89.1      92.2
    Dual-state word-beam search     96.2      97.0

    The raw counts can be found in the Zenodo repository, "Erratum to: A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification, doi: 10.1007/s00521-020-05612-0".
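The quantity the erratum recomputes is a test-set-size-weighted mean of per-data-set word accuracies. A minimal sketch, assuming a mapping from data-set name to (accuracy, test-set size); the sizes below are made-up illustration values, not the paper's counts.

```python
def weighted_word_accuracy(results):
    """Weighted average of word accuracy, weighting each data set
    by its test-set size (the quantity corrected in the erratum).

    `results` maps data-set name -> (accuracy_percent, test_set_size).
    """
    total = sum(n for _, n in results.values())
    return sum(acc * n for acc, n in results.values()) / total

# Illustrative sizes only; a large data set (here "KdK") dominates
# the weighted average, as the erratum notes.
example = {"RIMES": (96.0, 7_000), "KdK": (97.5, 30_000), "GW": (90.0, 1_000)}
print(round(weighted_word_accuracy(example), 2))
```

This makes the erratum's point concrete: because one test set is much larger than the others, the weighted mean tracks it closely, so the per-data-set differences carry the interpretive weight.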

    How to limit label dissipation in neural-network validation: Exploring label-free early-stopping heuristics

    In recent years, deep learning (DL) has achieved impressive successes in many application domains, including handwritten-text recognition. However, DL methods demand a long training process and a huge amount of human-labeled data. To address these issues, we explore several label-free heuristics for detecting the early-stopping point when training convolutional neural networks: (1) the cumulative distribution of the standard deviation of kernel weights (SKW); (2) the moving standard deviation of SKW; and (3) the standard deviation of the sum of weights over a window in the epoch series. We applied the proposed methods to the common RIMES and Bentham data sets as well as another highly challenging historical data set. In comparison with the usual stopping criterion, which uses labels for validation, the label-free heuristics are at least 10 times faster per epoch when the same training set is used. The alternative stopping heuristics may require additional epochs, but never as much computing time as the original criterion. The character error rate (%) on the test set under the label-free heuristics is about one percentage point lower than under the usual stopping criterion. The label-free early-stopping methods have two benefits: they do not require a computationally intensive evaluation of a validation set per epoch, and all labels can be used for training, which specifically benefits the underrepresented word or letter classes.
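Heuristic (2) above can be sketched as follows: compute one SKW scalar per epoch (the standard deviation of all kernel weights), then stop once a moving standard deviation over that series falls below a threshold, i.e. once the kernel statistics have stabilised. The window size and threshold here are illustrative assumptions, not the paper's tuned values.

```python
import statistics

def skw(kernel_weights):
    """Standard deviation of all kernel weights: one scalar per epoch."""
    return statistics.pstdev(kernel_weights)

def stop_on_flat_skw(skw_series, window=5, threshold=1e-4):
    """Label-free early-stopping sketch: return the first epoch count
    at which the moving standard deviation of the SKW series over the
    last `window` epochs drops below `threshold`, or None if training
    should continue. Window and threshold are illustrative values."""
    for t in range(window, len(skw_series) + 1):
        if statistics.pstdev(skw_series[t - window:t]) < threshold:
            return t  # number of epochs after which to stop
    return None

# SKW rises during learning, then flattens once the weights settle:
series = [0.10, 0.14, 0.17, 0.19, 0.200, 0.2001, 0.2002, 0.2001, 0.2002]
print(stop_on_flat_skw(series))
```

Note that no labels are touched anywhere: the criterion reads only the network's own weights, which is why it is cheap per epoch and frees the validation labels for training.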

    Improving the robustness of LSTMs for word classification using stressed word endings in dual-state word-beam search

    In recent years, long short-term memory neural networks (LSTMs) followed by a connectionist temporal classification (CTC) layer have shown strength in solving handwritten-text recognition problems. Such networks can handle not only sequence variability but also geometric variation, by using a convolutional front end at the input side. Although different approaches have been introduced for decoding activations in the CTC output layer, only limited consideration has been given to the use of proper label-coding schemes. In this paper, we use a limited-size ensemble of end-to-end convolutional LSTM neural networks to evaluate four label-coding schemes. Additionally, we evaluate two CTC search techniques: best-path search vs dual-state word-beam search (DSWBS). The classifiers in the ensemble have comparable architectures but variable numbers of hidden units. We tested the coding and search approaches on three datasets: the standard IAM benchmark dataset (English) and two more difficult historical handwritten datasets (diaries and field notes, highly multilingual). Results show that stressing the word endings in the label-coding scheme yields a higher performance, especially for DSWBS. However, stressing the start-of-word shapes with a token appears to be disadvantageous.
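The idea of "stressing word endings" in a label-coding scheme can be sketched as appending an explicit separator token after each word, giving the network a distinct class for the word-final position. The token name and exact scheme below are illustrative assumptions; the paper's Extra-separator coding may differ in detail.

```python
def encode_extra_separator(words, end_token="<sep>"):
    """Sketch of a label coding that stresses word endings: each word
    is split into character labels, followed by an explicit separator
    token marking the word-final position. `end_token` is a
    hypothetical name, not necessarily the paper's symbol."""
    labels = []
    for word in words:
        labels.extend(list(word))
        labels.append(end_token)
    return labels

print(encode_extra_separator(["van", "Oort"]))
```

Marking the end rather than the start matches the paper's finding: word-final strokes tend to have a characteristic shape worth a dedicated label, whereas a start-of-word token adds a class the visual evidence does not support.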
